\title{Variational Inference}

\subsection{Variational Inference}

Variational inference is an umbrella term for algorithms that cast
posterior inference as an optimization problem
\citep{hinton1993keeping,waterhouse1996bayesian,jordan1999introduction}.

The core idea involves two steps:
\begin{enumerate}
   \item posit a family of distributions $q(\mathbf{z}\;;\;\lambda)$
   over the latent variables;
   \item match $q(\mathbf{z}\;;\;\lambda)$ to the posterior by
   optimizing over its parameters $\lambda$.
 \end{enumerate}
This strategy converts the intractable problem of computing the
posterior $p(\mathbf{z} \mid \mathbf{x})$ into an optimization
problem: minimize a divergence measure between the two distributions,
\begin{align*}
  \lambda^*
  &=
  \arg\min_\lambda \text{divergence}(
  p(\mathbf{z} \mid \mathbf{x})
  ,
  q(\mathbf{z}\;;\;\lambda)
  ).
\end{align*}
The optimized distribution $q(\mathbf{z}\;;\;\lambda^*)$ then serves
as a proxy for the posterior $p(\mathbf{z}\mid \mathbf{x})$.
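The two-step recipe can be sketched end to end in a toy setting. A
common choice of divergence is the Kullback--Leibler divergence
$\text{KL}(q \,\|\, p)$, which for two Gaussians has a closed form.
The sketch below is illustrative and uses plain NumPy/SciPy rather
than Edward's API; the Gaussian posterior and variational family are
assumptions made purely so the optimum can be checked by hand.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setting: pretend the posterior p(z | x) is known to be
# Normal(mu_p, sigma_p).  (In practice it is intractable; this is
# purely to illustrate the recipe.)
mu_p, sigma_p = 2.0, 1.0

# Step 1: posit a variational family q(z; lambda) = Normal(mu, sigma),
# parameterized by lambda = (mu, log sigma) so sigma stays positive.
# Step 2: match q to p by minimizing KL(q || p), which between two
# Gaussians has the closed form used below.
def kl_q_p(lam):
    mu_q, log_sigma_q = lam
    sigma_q = np.exp(log_sigma_q)
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
            - 0.5)

result = minimize(kl_q_p, x0=np.array([0.0, 0.0]))
mu_star, sigma_star = result.x[0], np.exp(result.x[1])
# Because p happens to lie inside the variational family, the optimum
# recovers the posterior exactly: mu_star ~= 2, sigma_star ~= 1.
```

In realistic models the KL divergence has no closed form, so
algorithms instead maximize a tractable bound (the ELBO), but the
shape of the computation is the same: parameterize $q$, then run an
optimizer over $\lambda$.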

Edward takes the perspective that the posterior is (typically)
intractable, and thus we must build a model of the latent variables
that best approximates the posterior.
This is analogous to the perspective
that the true data generating process is unknown, and thus we build
models of data to best approximate the true process.

For details on variational inference classes defined in Edward,
see the \href{/api/inference}{inference API}.
For background on specific variational inference algorithms in
Edward, see the other inference \href{/tutorials/}{tutorials}.

\subsubsection{References}\label{references}

